72 research outputs found

Java-ML: a machine learning library

Authors: Abeel Thomas; Saeys Yvan; Van de Peer Yves
Publication date: 01/01/2009

Java-ML is a collection of machine learning and data mining algorithms that aims to be a readily usable and easily extensible API for both software developers and research scientists. The interfaces for each type of algorithm are kept simple, and algorithms strictly follow their respective interface. Comparing different classifiers or clustering algorithms is therefore straightforward, and implementing new algorithms is also easy. The implementations of the algorithms are clearly written and properly documented, and can thus be used as a reference. The library is written in Java and is available from http://java-ml.sourceforge.net/ under the GNU GPL.

Ghent University Academic Bibliography

Archivsystem Ask23

Event based text mining for integrated network construction

Authors: Saeys Yvan; Van de Peer Yves; Van Landeghem Sofie
Publication venue: Microtome Publishing
Publication date: 01/01/2010

The scientific literature is a rich and challenging data source for research in systems biology, providing numerous interactions between biological entities. Text mining techniques have been increasingly useful to extract such information from the literature in an automatic way, but up to now the main focus of text mining in the systems biology field has been restricted mostly to the discovery of protein-protein interactions. Here, we take this approach one step further, and use machine learning techniques combined with text mining to extract a much wider variety of interactions between biological entities. Each particular interaction type gives rise to a separate network, represented as a graph, all of which can be subsequently combined to yield a so-called integrated network representation. This provides a much broader view on the biological system as a whole, which can then be used in further investigations to analyse specific properties of the networ
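The idea of combining one network per interaction type into an integrated network can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the choice of directed edges, and the example entities and interaction types are all hypothetical placeholders.

```python
from collections import defaultdict


def integrate_networks(networks):
    """Merge per-type networks into one edge-labelled network.

    networks: dict mapping an interaction type to an iterable of
    (source, target) pairs (edges are treated as directed here, which is
    an assumption of this sketch).
    Returns a dict mapping each (source, target) pair to the set of
    interaction types observed between the two entities.
    """
    integrated = defaultdict(set)
    for interaction_type, edges in networks.items():
        for source, target in edges:
            integrated[(source, target)].add(interaction_type)
    return dict(integrated)


# Hypothetical example data; entity and type names are placeholders.
networks = {
    "binding": [("A", "B"), ("B", "C")],
    "phosphorylation": [("A", "B")],
    "regulation": [("C", "A")],
}
integrated = integrate_networks(networks)
```

Keeping the interaction types as edge labels means each original network can still be recovered from the integrated one by filtering on a single type.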

Ghent University Academic Bibliography

Extracting protein-protein interactions from text using rich feature vectors and feature selection

Authors: De Baets Bernard; Saeys Yvan; Van de Peer Yves; Van Landeghem Sofie
Publication venue: Turku Centre for Computer Sciences (TUCS)
Publication date: 01/01/2008

Because of the intrinsic complexity of natural language, automatically extracting accurate information from text remains a challenge. We have applied rich feature vectors derived from dependency graphs to predict protein-protein interactions using machine learning techniques. We present the first extensive analysis of applying feature selection in this domain, and show that it can produce more cost-effective models. For the first time, our technique was also evaluated on several large-scale cross-dataset experiments, which offers a more realistic view on model performance. During benchmarking, we encountered several fundamental problems hindering comparability with other methods. We present a set of practical guidelines to set up a meaningful evaluation. Finally, we have analysed the feature sets from our experiments before and after feature selection, and evaluated the contribution of both lexical and syntactic information to our method. The gained insight will be useful to develop better-performing methods in this domain.

Ghent University Academic Bibliography

Benchmarking machine learning techniques for the extraction of protein-protein interactions from text

Authors: De Baets Bernard; Saeys Yvan; Van de Peer Yves; Van Landeghem Sofie
Publication venue: Université de Liège
Publication date: 01/01/2008

Ghent University Academic Bibliography

Coordinated functional divergence of genes after genome duplication in Arabidopsis thaliana

Authors: De Smet Riet; Li Zhen; Sabaghian Ehsan; Saeys Yvan; Van de Peer Yves
Publication venue: American Society of Plant Biologists (ASPB)
Publication date: 01/01/2017

Gene and genome duplications have been rampant during the evolution of flowering plants. Unlike small-scale gene duplications, whole-genome duplications (WGDs) copy entire pathways or networks, and as such create the unique situation in which such duplicated pathways or networks could evolve novel functionality through the coordinated sub- or neofunctionalization of their constituent genes. Here, we describe a remarkable case of coordinated gene expression divergence following WGDs in Arabidopsis thaliana. We identified a set of 92 homoeologous gene pairs that all show a similar pattern of tissue-specific gene expression divergence following WGD, with one homoeolog showing predominant expression in aerial tissues and the other homoeolog showing biased expression in tip-growth tissues. We provide evidence that this pattern of gene expression divergence seems to involve genes that have a role in cell polarity and that likely function in the maintenance of cell wall integrity. Following WGD, many of these duplicated genes evolved separate functions through subfunctionalization in growth/development and stress response. Uncoupling these processes through genome duplications likely provided important adaptations with respect to growth and morphogenesis and defense against biotic and abiotic stress.

Ghent University Academic Bibliography

Selecting relevant features for gene structure prediction

Authors: Aeyels Dirk; Degroeve Sven; Rouzé Pierre; Saeys Yvan; Van de Peer Yves
Publication venue: VUB Press
Publication date: 01/01/2004

Ghent University Academic Bibliography

Feature selection for splice site prediction: A new method using EDA-based feature ranking

Authors: Aeyels Dirk; Degroeve Sven; Rouzé Pierre; Saeys Yvan; Van de Peer Yves
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. RESULTS: In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a more general framework of genetic algorithms. From the estimated distribution of the algorithm, a feature ranking is derived. Afterwards this ranking is used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. CONCLUSION: We show that this technique proves to be more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do), this method provides a dynamical view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features.
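The two steps named in the abstract (deriving a feature ranking from the estimated distribution, then discarding features iteratively) can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a univariate EDA in the style of UMDA, and the function names, parameter values, probability clamping, and toy fitness function are all assumptions made for the example. In a wrapper setting the fitness would be a classifier's validation score.

```python
import random


def umda_feature_ranking(n_features, fitness, pop_size=60, n_select=20,
                         n_iter=15, seed=0):
    """Rank features with a univariate EDA (UMDA-style sketch).

    fitness: callable taking a tuple of 0/1 flags (one per feature) and
    returning a score, where higher is better.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_features  # initial distribution: each feature in or out
    for _ in range(n_iter):
        # Sample a population of feature subsets from the current distribution.
        population = [tuple(1 if rng.random() < p else 0 for p in probs)
                      for _ in range(pop_size)]
        # Keep the best-scoring subsets.
        selected = sorted(population, key=fitness, reverse=True)[:n_select]
        # Re-estimate each feature's marginal probability, clamped so that
        # no feature becomes permanently fixed (an assumption of this sketch).
        probs = [min(0.95, max(0.05, sum(s[i] for s in selected) / n_select))
                 for i in range(n_features)]
    # Features with the highest estimated probability rank first.
    ranking = sorted(range(n_features), key=lambda i: probs[i], reverse=True)
    return ranking, probs


def discard_iteratively(ranking, step=1):
    """Yield successively smaller feature sets, dropping the lowest-ranked first."""
    kept = list(ranking)
    while kept:
        yield list(kept)
        kept = kept[:-step]


# Toy fitness (hypothetical): features 0-2 help, every other feature costs a little.
def toy_fitness(subset):
    return sum(subset[:3]) - 0.1 * sum(subset[3:])


ranking, probs = umda_feature_ranking(8, toy_fitness)
subsets = list(discard_iteratively(ranking))
```

Each subset yielded by `discard_iteratively` could be scored with the same classifier, which gives a performance curve over subset sizes rather than a single best subset.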

Springer - Publisher Connector

Directory of Open Access Journals

Ghent University Academic Bibliography

PubMed Central

Large-scale structural analysis of the core promoter in mammalian and plant genomes

Authors: Degroeve Sven; Florquin Kobe; Rouzé Pierre; Saeys Yvan; Van de Peer Yves
Publication venue: Oxford University Press
Publication date: 01/01/2005

DNA encodes at least two independent levels of functional information. The first level is for encoding proteins and sequence targets for DNA-binding factors, while the second one is contained in the physical and structural properties of the DNA molecule itself. Although the physical and structural properties are ultimately determined by the nucleotide sequence itself, the cell exploits these properties in a way in which the sequence itself plays no role other than to support or facilitate certain spatial structures. In this work, we focus on these structural properties, comparing them between different organisms and assessing their ability to describe the core promoter. We prove the existence of distinct types of core promoters, based on a clustering of their structural profiles. These results indicate that the structural profiles are highly conserved within plants (Arabidopsis and rice) and animals (human and mouse), but differ considerably between plants and animals. Furthermore, we demonstrate that these structural profiles can be an alternative way of describing the core promoter, in addition to more classical motif or IUPAC-based approaches. Using the structural profiles as discriminatory elements to separate promoter regions from non-promoter regions, reliable models can be built to identify core-promoter regions using a strictly computational approach.

CiteSeerX

Ghent University Academic Bibliography

PubMed Central

GenomeView : a next-generation genome browser

Authors: Abeel Thomas; Galagan James; Saeys Yvan; Van de Peer Yves; Van Parys Thomas
Publication venue: Oxford University Press (OUP)
Publication date: 17/11/2011

Due to ongoing advances in sequencing technologies, billions of nucleotide sequences are now produced on a daily basis. A major challenge is to visualize these data for further downstream analysis. To this end, we present GenomeView, a stand-alone genome browser specifically designed to visualize and manipulate a multitude of genomics data. GenomeView enables users to dynamically browse high volumes of aligned short-read data, with dynamic navigation and semantic zooming, from the whole genome level to the single nucleotide. At the same time, the tool enables visualization of whole genome alignments of dozens of genomes relative to a reference sequence. GenomeView is unique in its capability to interactively handle huge data sets consisting of tens of aligned genomes, thousands of annotation features and millions of mapped short reads, as both viewer and editor. GenomeView is freely available as an open source software package.

CiteSeerX

Ghent University Academic Bibliography

PubMed Central

Highlights of the BioTM 2010 workshop on advances in bio text mining

Authors: Abeel Thomas; Daelemans Walter; Morante Roser; Saeys Yvan; Van Asch Vincent; Van de Peer Yves; Van Landeghem Sofie
Publication venue: BioMed Central
Publication date: 01/01/2010

This meeting report gives an overview of the keynote lectures, the panel discussion and a selection of the contributed presentations. The workshop was held in Gent, Belgium, on May 10-11. It featured a tutorial aimed at a broad audience of (computational) biologists, (computational) linguists and researchers working purely on text mining.

Springer - Publisher Connector

Directory of Open Access Journals

Ghent University Academic Bibliography

PubMed Central

Institutional Repository Universiteit Antwerpen